Load in packages

## Acquire demographic data on tennis players

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.1 
## ✔ readr   2.1.3      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(stringr)
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout
library(scales)
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
#library(ggradar)

1) - Tennis Demographics Data

Note: The player demographics data contains 56,602 players.

Plot country demographics of ATP men's players

So the US, Spain, Australia, Germany, and Italy have each had a large number of male players in professional tennis over its history.

2) - Create a visualization that allows me to plot the proportion of service games won and the proportion of opponent service games broken for Grand Slams in 2021

Let me explain in plain English. In each game (first to 4 points, but you have to win by 2) one player does all the serving. Then the other player serves the next game. Thus, we have:

\[\text{Proportion of Service Games won} = \frac{\text{number of service games the player won}}{\text{number of games the player served}} \]

Likewise, we have:

\[\text{Proportion of Opponent Service Games broken} = \frac{\text{number of opponent service games the player won}}{\text{number of opponent service games}} \]
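As a rough illustration of the two proportions above, here is a minimal sketch on a made-up table with one row per player per match. The column names (`sv_games`, `sv_games_won`, `ret_games`, `ret_games_won`) and all values are hypothetical and are not taken from the original data.

```r
library(dplyr)

# Toy data: one row per player per match (all values invented for illustration).
# sv_games      = games the player served
# sv_games_won  = service games the player won
# ret_games     = games the opponent served
# ret_games_won = opponent service games the player won (breaks)
toy_matches <- tibble(
  player        = c("A", "A", "B", "B"),
  sv_games      = c(10, 12, 11, 9),
  sv_games_won  = c(9, 10, 8, 9),
  ret_games     = c(10, 11, 11, 10),
  ret_games_won = c(3, 2, 1, 4)
)

# Sum the counts per player, then form the two proportions.
serve_summary <- toy_matches %>%
  group_by(player) %>%
  summarise(
    prop_service_won = sum(sv_games_won) / sum(sv_games),
    prop_opp_broken  = sum(ret_games_won) / sum(ret_games),
    .groups = "drop"
  )

serve_summary
```

Plotting `prop_service_won` against `prop_opp_broken` would then give one point per player, which is one way to lay out the visualization described in this section.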

3) - Plot Probability of winning on serve over time

Plot probability of winning on the serve for the Big 3 (Nadal, Djokovic, and Federer) by season.

To calculate the probability of winning on the serve, use the following formula:

\[\text{P(Winning on Serve)} = \frac{\text{number of times winning on 1st serve} + \text{number of times winning on 2nd serve}}{\text{number of serves}} \]

P(Winning on Serve) is often computed at the game level, but it can be extended to the season level by adding up the numerator and denominator counts for a player over a given year, which is what I do here.
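The season-level aggregation can be sketched as follows. This is only an illustration on invented numbers: the table `toy_serve` and its columns (`first_won`, `second_won`, `serves`) are hypothetical names, not the ones used in the original analysis.

```r
library(dplyr)

# Toy serve counts, one row per player per match (invented values).
toy_serve <- tibble(
  player     = c("X", "X", "X", "Y"),
  season     = c(2020, 2020, 2021, 2020),
  first_won  = c(40, 35, 50, 30),   # times winning on 1st serve
  second_won = c(15, 20, 10, 12),   # times winning on 2nd serve
  serves     = c(80, 75, 90, 70)    # denominator of the formula
)

# Add up the counts within each player-season first, then divide.
season_serve <- toy_serve %>%
  group_by(player, season) %>%
  summarise(
    p_win_on_serve = (sum(first_won) + sum(second_won)) / sum(serves),
    .groups = "drop"
  )

season_serve
```

Summing the counts before dividing weights every serve equally, whereas averaging per-match proportions would give a short match the same weight as a long one.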

4) - Create a radar plot

Most of these are quite low, as expected, except for carpet. However, it turns out each player only had a very small number of games on carpet, which would explain why the Ace Service rate is so high for carpet.
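One way to spot this kind of small-sample effect is to report the number of rows behind each rate. The sketch below assumes the ace rate is aces divided by serves, which the text does not state explicitly. The data and column names are invented for illustration.

```r
library(dplyr)

# Toy data, one row per match (invented values; hypothetical column names).
toy_surface <- tibble(
  surface = c("Hard", "Hard", "Clay", "Clay", "Carpet"),
  aces    = c(8, 10, 3, 5, 9),
  serves  = c(90, 100, 80, 85, 40)
)

# Report the sample size next to each rate so thin categories stand out.
ace_by_surface <- toy_surface %>%
  group_by(surface) %>%
  summarise(
    n_rows   = n(),
    ace_rate = sum(aces) / sum(serves),  # assumed definition of ace rate
    .groups = "drop"
  )

ace_by_surface
```

In this toy table the carpet rate is the highest, but it rests on a single row, so it should be read with caution.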

5) - Violin Plots of Height (in cm) by Handedness

Check how many players there are in each handedness category.

##   right left     U missing
## 1 15444 1395 39519     243

The demographics list contains every player who has ever played on the ATP tour, so some of the oldest players have dates of birth around 1913, and the file contains 56,602 players in total. Given that U occurs mainly among the older players, I think U means the handedness of that player is “Unknown”, but I’m not sure. Some values are actually missing, and those end up in a violin with no label. Overall, I will stick to using more recent data, from say the 1990s and beyond.
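A minimal sketch of how the handedness categories could be counted and the unlabeled violin avoided. It assumes the raw codes are `"R"`, `"L"`, `"U"`, and `NA`, which is a guess based on the counts shown above. The table and column names are hypothetical.

```r
library(dplyr)

# Toy player table (invented values; the hand codes are an assumption).
toy_players <- tibble(
  name = c("P1", "P2", "P3", "P4", "P5"),
  hand = c("R", "L", "U", NA, "R")
)

# Give every case an explicit label, including missing values.
hand_counts <- toy_players %>%
  mutate(hand_label = case_when(
    hand == "R" ~ "right",
    hand == "L" ~ "left",
    hand == "U" ~ "U",
    is.na(hand) ~ "missing"
  )) %>%
  count(hand_label)

# Keep only players with a known hand before drawing the violins.
known_hand <- toy_players %>%
  filter(hand %in% c("R", "L"))

hand_counts
```

Filtering to `known_hand` drops both the `U` and the missing rows, so the violin plot would show only left- and right-handed players.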